Risk Evaluation of Bone Metastases and a Simple Tool for Detecting Bone Metastases in Prostate Cancer: A Population-Based Study

Introduction Population-based estimates of the incidence and prognosis of bone metastases in prostate cancer (PC) are lacking. We aimed to characterize the incidence and risk of bone metastases and develop a simple tool for the prediction of bone metastases among patients with PC. Methods Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. A total of 75698 patients with PC with confirmed presence or absence of bone metastases at diagnosis between 1975 and 2019 in the United States were included in the analysis. Data were stratified by age, race, residence, median income, prostate-specific antigen (PSA) values, tumor size, distant metastatic history, and positive lymph node scores. Multivariable logistic and Cox regressions were performed to identify predictors of bone metastases and factors correlated with all-cause mortality. Classification tree analysis was performed to establish a model. Results After patients with PC with missing data were excluded, 75698 cases remained. Among these, 3835 patients had bone metastases. Incidence proportions were highest in patients with a high PSA value (odds ratio (OR), 2.49; 95% confidence interval (CI), 1.35-4.35; p < 0.002). Multivariable Cox regression and risk analyses indicated that high PSA values (hazard ratio (HR), 19.8; 95% CI, 18.5-21.2; p < 0.001) and high positive lymph node scores (vs. score 0; HR, 8.65; 95% CI, 7.89-9.49; p < 0.001) were significant risk factors for mortality. Meanwhile, in the prediction tree analysis, PSA values and lymph node scores were the most significant determining factors in two models. Median survival among the patients with PC was 78 months, but only 31 months among those with bone metastases. Conclusion Patients with PC with high PSA values or high positive lymph node scores were at a significantly higher risk of bone metastases. Our study may provide a simple and accurate tool to identify patients with PC at high risk of bone metastases based on population-based estimates.


Introduction
Prostate cancer (PC) is one of the most common malignant diseases globally; it was the fourth most commonly diagnosed cancer in both sexes and the second most frequently occurring cancer in males in 2020 [1]. With the development of new treatment technology, especially the robot-assisted surgery system, the survival of patients with PC has increased dramatically in recent years. However, the prevalence of distant metastasis has also increased during this process [2]. Bone is the most frequent distant metastasis site of PC. Bone metastases can result in bone destruction and lead to considerable disability and impaired quality of life [3]. Furthermore, patients with PC with bone metastases have a much poorer prognosis and notably higher mortality than those without bone metastases. Only 3% of patients with bone metastases survive beyond five years [4]. Therefore, detecting bone metastasis or accurately evaluating its risk in early PC diagnosis is critical.
At present, urologists assess the risk of bone metastases based on their own clinical experience or on a single diagnostic examination result, such as positron emission tomography/computed tomography (PET/CT) [5]. Some surgeons have also tried to establish a tool to help them assess the risk of bone metastases, but most of these studies are based on a single institution or their own experiences [6,7]. Large population-based studies that focus on the risk assessment of bone metastases in PC are lacking. Furthermore, no study has been published suggesting a simple tool specifically for the risk assessment of bone metastases in patients with PC. Therefore, it is important to build an easy and reliable predictive tool. With the development of computer science, machine learning has become widely used in medical and healthcare applications, especially in the development of new diagnostic or prediction systems [8]. This trend accelerated after the outbreak of the coronavirus disease 2019 (COVID-19) pandemic [9]. The Surveillance, Epidemiology, and End Results (SEER) database provides detailed information on cancer statistics for the United States (US) population [10]. SEER data are widely used for machine learning and for developing new diagnostic methods [11].
The objective of this study was to assess the basic clinical characteristics in patients with PC with or without bone metastases. Furthermore, we aimed to evaluate the survival details and investigate both clinical and sociodemographic predictors of bone metastasis among patients with PC. Finally, we used these data to develop and evaluate a simple tool to predict the risk of bone metastases that can be used in daily clinical work.

Methods
2.1. Database. The SEER database includes cancer incidence data from large-scale population-based cancer registries covering approximately 47.9% of the US population from 1975 to 2019. Information on the presence or absence of distant metastases at the time of first PC diagnosis was available for cases diagnosed from January 1, 2010, to December 31, 2015. In addition, the database collects patient demographics such as age, race, and sex, as well as clinical information such as primary tumor site, tumor morphology, stage at diagnosis, first course of treatment, and follow-up of vital status. In the first step, we identified 90812 patients who were diagnosed with PC with or without bone metastases.

Study Population.
We included patients diagnosed with PC (International Classification of Diseases for Oncology, third edition (ICD-O-3)/World Health Organization (WHO) 2008: "Prostate"; histology recode: 8140-8339) between January 1, 2010, and December 31, 2015 (Figure 1). We then selected the age at diagnosis, race, median household income, residence, prostate-specific antigen (PSA) value, metastases diagnosis, tumor size, lymph node diagnosis, survival follow-up, and death classification for evaluation in our study. Among these patients with PC, we excluded 15114 cases where important clinical or basic information was unknown or missing. After exclusion, 75698 cases were included in the final cohort for further analysis.

Definition of Incidence.
Patients were stratified by their PSA values: PSA < 10 was defined as the low-risk group, 10 ≤ PSA < 20 was defined as the intermediate-risk group, and PSA ≥ 20 was defined as the high-risk group. Incidence proportion was also assessed among patients with positive lymph node scores or metastases to distant sites. Absolute numbers and incidence proportions were calculated for patients with PC with bone metastases or other distant metastases at initial diagnosis. Age, race, household income, and place of residence were also evaluated to determine whether baseline characteristics can affect these factors. The baseline and clinical information was categorized according to the SEER database.
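The stratification and the incidence proportion described above can be illustrated with a minimal Python sketch. It reads the intermediate band as 10 ≤ PSA < 20; the function names and the ng/mL unit are assumptions made for illustration and are not taken from the study, whose analyses were run in R.

```python
def psa_risk_group(psa_ng_ml: float) -> str:
    """Map a PSA value to the three risk groups used for stratification.

    The ng/mL unit is an assumption; the text gives only the cut-offs.
    """
    if psa_ng_ml < 0:
        raise ValueError("PSA value cannot be negative")
    if psa_ng_ml < 10:
        return "low"
    if psa_ng_ml < 20:
        return "intermediate"
    return "high"


def incidence_proportion(cases: int, population: int) -> float:
    """Fraction of the cohort that has the condition at initial diagnosis."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


# Cohort counts reported in the study: 3835 of 75698 patients had bone metastases.
bone_met_percent = round(100 * incidence_proportion(3835, 75698), 2)
```

With the reported counts, the proportion rounds to 5.07%, which agrees with the figure quoted in the Discussion.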
We used multivariable logistic regression to compare baseline characteristics, including age, race, residence type, and household income, between all patients with PC and PC cases with bone metastases. In addition, relevant disease information, including PSA value, tumor size, lymph node index, and the presence of bone, lung, liver, or brain metastases, was available in the SEER database. This disease information was used to characterize the extent of systemic disease. For multivariable logistic regression analyses, we used the "stats" package and its "glm" function in R software (version 4.2.0; R Core Team, Vienna, Austria).
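To make the odds ratio (OR) and its 95% confidence interval (CI) concrete, the sketch below computes an unadjusted OR from a 2 × 2 table with a Wald interval. This is a simplification and not the multivariable model fitted in the study; the function name and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from the study).
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

A multivariable logistic regression reports the same kind of quantity, exp(coefficient), but adjusted for the other covariates in the model.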
The Kaplan-Meier method was used for survival analyses and for drawing the survival curves. In addition, multivariable Cox regression was carried out to evaluate the hazard ratio (HR) in different subgroups among patients with PC, with or without bone metastases. This analysis helped us to identify variables correlated with increased all-cause mortality. The "ezcox" and "survival" packages in R software were used to perform these analyses.
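For readers unfamiliar with the Kaplan-Meier method, the following sketch shows the product-limit estimate and how a median survival time is read from it. It is a minimal illustration in Python, not the R "survival" implementation used in the study; the function names are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if a death was observed at times[i], 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

Censored subjects contribute to the risk set until their last follow-up but never trigger a drop in the curve, which is why the estimate differs from a simple proportion of survivors.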
We also built a model via the classification tree method using the "rpart" package of R software. Tree classification is based on nonlinear discrimination analyses. First, it splits a data sample into different subgroups using independent variables. Then, the independent variables most strongly associated with the dependent variable were identified using R software. Bone metastasis was the dependent variable, and clinical or sociodemographic characteristics were the independent variables. To guarantee the reliability of the tree model, we used two different methods to compute the mean decrease Gini (MDG). To further confirm our conclusion, we randomly selected 300 cases from the data as a subgroup. For this subgroup, we built a decision tree classification model using the software package SPSS 18.0 (SPSS Inc., Chicago, IL, USA).

Results
Living in rural areas, a lower household income, a larger tumor size, and a higher positive lymph node score were positively correlated with the incidence of bone metastases; however, these factors were not significant. Neither age at diagnosis nor race was significantly associated with an increased risk of bone metastases in PC cases. These results are presented in Table 2.

Survival Analyses in Patients with PC with Bone Metastases.
As stratified by PSA values, the median survival times among patients with PC (n = 75698) and patients with bone metastases (n = 3835) are shown in Table 1. In both the total PC cases and the bone metastases subgroup, patients with low PSA values had a longer median survival compared with patients with higher PSA values. Furthermore, survival in the overall cohort of patients with PC (Figure 2(a)) was much longer than that of the metastasis cases (Figure 2(b)). In bone metastasis cases, overall survival estimates (Figure 2

Multivariable Cox Regression and Risk Analyses for the Bone Metastasis Incidence in Patients with PC.
Table 3 shows the results of multivariable Cox regression for all-cause mortality among patients with PC with bone metastases. In all patients with PC, the risk of all-cause mortality was significantly higher in the following categories: ( Neither race nor income was significantly associated with elevated mortality for all patients with PC.
In patients with PC with bone metastases, the risk of all-cause mortality was significantly higher in the following categories: ( Considering that the number of cases with bone metastases was limited, no significance was found between mortality and factors such as age, race, residence district, income, or tumor size. In general, poorer survival prognosis was correlated with a series of clinical and sociodemographic factors, especially PSA values, metastases, and positive lymph node scores. Furthermore, we applied these clinical and sociodemographic characteristics in further analyses using the random forest method and decision trees to explore the association between these factors and bone metastasis risk. Using two different models in the random forest algorithm, we observed that the PSA value had the largest MDG. The MDG was 1731.99 in the type 1 model (Figure 3(a)) and 134.00 in the type 2 model (Figure 3(b)). Following the PSA value, the lymph node score also had a higher MDG than the other factors in both models. These results indicated that the PSA value and lymph node score at diagnosis had the greatest association with the incidence of bone metastasis in patients with PC.
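The quantity behind the MDG can be illustrated for a single split. The sketch below computes the Gini impurity of a binary outcome and the decrease obtained by one split; random forest importance measures aggregate such decreases over many splits and trees. This is a minimal Python illustration with hypothetical function names, not the procedure used to produce Figure 3.

```python
from typing import List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity for binary labels (1 = bone metastasis, 0 = none)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    # For two classes, 1 - p^2 - (1 - p)^2 simplifies to 2p(1 - p).
    return 2 * p * (1 - p)


def gini_decrease(labels: Sequence[int], goes_left: Sequence[bool]) -> float:
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    if len(labels) != len(goes_left):
        raise ValueError("labels and goes_left must have the same length")
    left: List[int] = [y for y, g in zip(labels, goes_left) if g]
    right: List[int] = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    if n == 0:
        return 0.0
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted
```

A variable that separates patients with and without bone metastases cleanly produces a large decrease, whereas an uninformative variable produces a decrease near zero; this is the sense in which a larger MDG marks a more important predictor.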
The decision tree illustrates that among all the clinical and nonclinical parameters in our analyses, high PSA values and positive lymph node scores could be good indexes for predicting bone metastases in patients with PC (Figure 3(c)). High PSA was the most important determining factor and formed the first-level split into the two initial branches of the tree. The accuracy of the tree was 95.90%. To further confirm accuracy, 300 cases were randomly selected as a subgroup from our data. Decision tree classification was carried out on this 300-case subgroup, and it reconfirmed that a higher PSA value was a critical factor indicating bone metastases (Supplementary Figure 1A, B).

Discussion
The number of patients with PC is increasing worldwide, and the prognosis of patients with PC varies according to factors such as age and race [12]. PC is prone to distant metastasis, leading to a very poor prognosis. Among these metastases, bone metastasis accounts for the largest proportion [3]. More than half of patients already have bone metastases at their initial diagnosis [13].
In this study, we described the incidence of identified bone metastases among patients with newly diagnosed PC and evaluated the patients' risk and survival characteristics based on the SEER database, a large population-based database. Because the SEER database only provided bone metastasis information at the initial diagnosis, the incidence is likely to have been underestimated. However, the large sample size still supports the reliability and accuracy of our results. A higher PSA value was significantly correlated with a higher incidence of bone metastases and indicated a poorer survival prognosis. Median survival time ranged from 57 months in patients with high PSA values to 82 months in patients with low PSA values. Based on these results, a tree model was established to assess the bone metastasis risk, which could help doctors identify those at a high risk.
Bone is one of the most common organs affected by metastases in human cancers, especially in lung cancer, PC, and breast cancer [14]. Compared with patients with PC without bone metastasis, cases with bone metastasis have significantly shorter survival (2.2 years vs. 3.5 years) and higher mortality (73% vs. 19%). Bone metastases serve as a major cause of death in patients with PC [15]. Furthermore, most cases of bone metastasis are also castration-resistant. Therefore, there are no effective treatments for these cases [16].
The PSA test is a risk assessment tool for PC diagnosis and prognosis evaluation [17]. Some recent studies have used the nadir PSA level and time to nadir for prognosis assessment in castration-resistant patients with PC [18]. In most of these studies, incidence and survival results were not stratified by PSA values or other subtypes. Some urologists also use advanced medical examinations such as PET/CT or bone scintigraphy to evaluate the risk of bone metastases in patients with PC [19]. Most of these examinations are expensive, and patients are often not willing to take these tests in the initial stages of their disease. Therefore, developing a simple and easy tool is important for patients with PC, especially for financially disadvantaged patients. With the development of artificial intelligence technology, machine learning can help us to develop new and easy diagnostic or predictive tools in clinical work, based on population-based big data. It has been applied in many common diseases, such as tuberculosis [20] and Alzheimer's disease [21]. Furthermore, some physician scientists use deep learning to pioneer the next generation of medical robotics [22].
A weakness of our study is that information about bone metastases was available only at first presentation rather than at any time over the disease course. However, the SEER database provides population-based data spanning a long period, from 1975 to 2019, which supports the accuracy and reliability of our conclusions in real-world practice. In our study, we focused on the association between the risk of bone metastases and PSA values as well as other baseline or clinical characteristics. We observed that 5.07% of patients with PC had bone metastases at initial diagnosis and that 97.8% of cases with metastatic PC at any distant site had bone metastases. Among the entire cohort, we found that PC cases with high PSA values (vs. low PSA cases) had significantly higher odds of having bone metastases at initial diagnosis. Furthermore, our data also indicated that living in rural areas, lower household income, larger tumor size, and higher positive lymph node scores were positively correlated with the incidence of bone metastases; however, these factors were not statistically significant. Many studies have indicated that clinical and socioeconomic factors may affect survival among patients with malignancies [23,24]. In our study, the median survival was 78 months for patients with PC and only 31 months among patients with bone metastases. Our data indicated that the HR varied by subtype in all patients with PC and in those with bone metastases. Patients with PC with high PSA values had the highest HR (vs. low PSA: HR, 19.8; 95% CI, 18.5-21.2; p < 0.001 in all PC cases; vs. low PSA: HR, 1.6; 95% CI, 1.37-1.86; p < 0.001 in bone metastasis cases). Patients with high PSA values had the worst survival in bone metastasis cases. Besides the PSA value, we also found that a higher positive lymph node score was positively correlated with higher HR values. Our results are consistent with some general trends reported in previous studies [25,26].
Meanwhile, other baseline characteristics, including age, income, tumor size, and residence area, also affected the HR values. We constructed a decision tree model based on the HR and conducted survival analyses for bone metastasis prediction in patients with newly diagnosed PC. In this model, we found that patients with high PSA scores and positive lymph nodes required close attention to the possibility of bone metastases during their follow-up medical evaluations. In comparison to other evaluation methods in clinical work, the prediction tree model is the most convenient for urologists and researchers. The tree is easily processed from the root to the terminal nodes through several intermediate branches. This process depends on simple "yes" or "no" questions. It is much easier for clinical workers to analyze daily clinical information or samples from big public databases with this approach than with other complex mathematical methods [27].

Figure 3: Mean decrease Gini (MDG) expression in two different models using random forest algorithm and prediction tree model for bone metastases. In either the (a) type 1 model or (b) the type 2 model, the PSA value had the largest MDG, followed by positive lymph node scores. (c) Classification tree for predicting bone metastases among patients with PC.
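The root-to-leaf "yes"/"no" traversal can be sketched as follows. The tree in this example is hypothetical: only the PSA ≥ 20 cut-off comes from the risk-group definition in the Methods, while the node order, the lymph node question, and the leaf labels are assumptions for illustration and do not reproduce the fitted tree in Figure 3(c).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    question: str
    test: Callable[[dict], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], patient: dict) -> Tuple[str, List[Tuple[str, str]]]:
    """Walk from the root to a leaf, recording each yes/no answer."""
    path = []
    node = tree
    while isinstance(node, Node):
        answer = node.test(patient)
        path.append((node.question, "yes" if answer else "no"))
        node = node.yes if answer else node.no
    return node.label, path


# Hypothetical tree for illustration; not the model fitted in the study.
HYPOTHETICAL_TREE = Node(
    question="PSA >= 20?",
    test=lambda p: p["psa"] >= 20,
    yes=Node(
        question="Positive lymph nodes?",
        test=lambda p: p["positive_nodes"] > 0,
        yes=Leaf("higher risk"),
        no=Leaf("intermediate risk"),
    ),
    no=Leaf("lower risk"),
)

label, path = classify(HYPOTHETICAL_TREE, {"psa": 35.0, "positive_nodes": 2})
```

Returning the path alongside the label keeps the reasoning visible, so a clinician can see which answers led to the assigned risk category.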

Strengths and Limitations
The strengths of our study are that our sample was large and included diverse population-based cases and detailed clinical information obtained directly from surgeons and physicians, which supports the reliability of the research. Meanwhile, the prediction tool is simple and reliable, allowing urologists to assess the risk of bone metastases in their patients with PC quickly and noninvasively.
However, several limitations should be considered. The primary limitation is that most of the information on these patients with PC provided by the SEER database was obtained during the first treatment in the hospital; hence, patients who developed bone metastasis later in the course of their disease could not be included in our evaluation. This is an important limitation that should be considered by other researchers in the future. Also, the residence type and median household income in SEER were defined at a county level, not a personal level. To a certain extent, this may affect our analyses. Furthermore, we have no data on other important characteristics such as patients' smoking status, alcohol use, and the reasons why patients and their doctors chose a particular treatment. Finally, the SEER database does not provide information on the treatment that these patients with PC received.

Conclusions
Considering these limitations, we plan to build our own PC clinical database. Our database will include valuable treatment information and follow-up data after surgery, as well as the basic factors provided by the SEER database. We believe it will help urologists to evaluate the prognosis, including the risk of bone metastases after surgical treatment. It could be a meaningful addition to the simple tool we developed in this study. Big public databases can support the reliability of this study; however, they may lack some valuable clinical information. Finally, we will integrate our own database with the big public database to carry out further analyses in future studies.
Our research explores the epidemiology of bone metastases in patients with PC in the US. Patients with PC with high PSA values, lower household income, larger tumor size, and high positive lymph node scores are more likely to have poor prognosis and survival. Moreover, patients with high PSA values are at a significantly higher risk of bone metastases. In addition, high positive lymph node scores also indicate a high risk of bone metastases among patients with PC. A simple tree model for prediction of bone metastases among patients with PC has been created based on these findings. Our research provides useful information for urologic clinical research, as well as an easy and simple diagnostic tool for urologists and patients with PC in clinical diagnosis and treatment.

Data Availability
Data analyzed in the current study are available from the corresponding authors on reasonable request.